Fingerprinting Lexical Contexts over the Web

نویسندگان

  • Vincenzo Di Lecce
  • Marco Calabrese
  • Domenico Soldo
چکیده

In this paper a novel technique for identifying lexical contexts in web resources is presented. The basic idea is to consider web site anchortexts as lexicalized descriptions of an individual ontology organized in the form of a graph of concept words. In the search for peculiar semantic patterns, the concept of web minutia (transposed from the forensic domain) is introduced. The proposed technique consists in searching for web minutiae in the analyzed web sites by means of a golden ontology. Web minutiae act as fingerprints for context-specific web resources; in this sense they are a powerful computational tool to identify and categorize the Web. The WordNet database has been used as golden ontology for our experiments on English web documents. WordNet allows for indexing and retrieving word senses and interword taxonomical relations like hyponymy and hypernymy. It has proven to be an efficient mediator between web ontologies and context-dependent taxonomies. Our experiments have been carried out on a preliminary data set of several tens of thousand links taken by web sites of thirteen UK universities. Preliminary results seem to confirm the ability of web minutiae to identify lexical contexts across the Web.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Native and Non-native Use of Lexical Bundles in Discussion Section of Political Science Articles

The study of lexical bundles, among types of text analysis, is gaining importance over the others in the last century. The present study employed a frequency-based analysis approach to the use of lexical bundles. The discussion section of 60 political science articles, with corpora around 253,063 words were investigated in three aspects of structure, form, and function of lexical bundles. The p...

متن کامل

The role of Persian causative markers in the acquisition of English causative verbs

     This project investigates the relationship between lexical semantics and causative morphology in the acquisition of causative/inchoative-related verbs in English as a foreign language by Iranian speakers. Results of translation and picture judgment task show although L2 learners have largely acquired the correct lexico-syntactic classification of verbs in English, they were constrained by ...

متن کامل

Investigating Lexico-grammaticality in Academic Abstracts and Their Full Research Papers from a Diachronic Perspective

Development of science and academic knowledge has led to changes in academic language and transfer of information and knowledge. In this regard, the present study is an attempt to investigate lexico-grammaticality in academic abstracts and their full research papers in Linguistics, Chemistry and Electrical engineering papers published during 1991-2015 in academic journals from a diachronic pers...

متن کامل

Automatic Acquisition of Context-Specific Lexical Paraphrases

Lexical paraphrasing aims at acquiring word-level paraphrases. It is critical for many Natural Language Processing (NLP) applications, such as Question Answering (QA), Information Extraction (IE), and Machine Translation (MT). Since the meaning and usage of a word can vary in distinct contexts, different paraphrases should be acquired according to the contexts. However, most of the existing res...

متن کامل

Etymology, Contextual Pragmatic Clues, and Lexical Knowledge in L2 Idioms Learning

To investigate the effects of etymological elaboration, contextual pragmatic clues, and lexical knowledge on L2 idioms comprehension and production, 60 male intermediate level EFL students in three groups were selected. Each group was randomly assigned to one treatment condition. Group one participants were presented with the etymological explanation of idioms. In group two, the same idioms wer...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • J. UCS

دوره 15  شماره 

صفحات  -

تاریخ انتشار 2009